Solving Hidden-Semi-Markov-Mode Markov Decision Problems
Authors
Abstract
Hidden-Mode Markov Decision Processes (HM-MDPs) were proposed to represent sequential decision-making problems in non-stationary environments that evolve according to a Markov chain. In this paper, we introduce Hidden-Semi-Markov-Mode Markov Decision Processes (HS3MDPs), a generalization of HM-MDPs to the more realistic case of non-stationary environments that evolve according to a semi-Markov chain. Like HM-MDPs, HS3MDPs form a subclass of Partially Observable Markov Decision Processes. Therefore, large instances of HS3MDPs (and HM-MDPs) can be solved using an online algorithm, the Partially Observable Monte Carlo Planning (POMCP) algorithm, which is based on Monte Carlo Tree Search and uses particle filters to approximate the belief state. We propose a first adaptation of POMCP that solves HS3MDPs more efficiently by exploiting their structure. Our empirical results show that this first adapted POMCP reaches higher cumulative rewards than the original algorithm. However, in larger instances, POMCP may run out of particles. To solve this issue, we propose a second adaptation of POMCP that replaces particle filters with exact representations of beliefs. Our empirical results indicate that this new version reaches high cumulative rewards faster than the first adapted POMCP and remains efficient even for large problems.
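To make the idea of an exact belief representation concrete, here is a minimal Python sketch of a Bayesian belief update over hidden (mode, remaining duration) pairs. It is not taken from the paper: the function name `update_belief`, the table layouts `T`, `C` and `H`, the convention that a mode switches when its remaining duration reaches 0, and the assumption that the next state is drawn under the new mode are all illustrative choices made for this example.

```python
def update_belief(belief, s, a, s_next, T, C, H):
    """Exact Bayesian update of a belief over (mode, remaining duration).

    Assumed (hypothetical) layouts:
      belief[m][h]        probability of mode m with h steps remaining
      T[m][s][a][s_next]  state-transition probability inside mode m
      C[m][m2]            probability of switching from mode m to mode m2
      H[m][m2][h]         probability that m2, entered from m, has h steps remaining
    """
    n_modes = len(belief)
    n_dur = len(belief[0])
    new_belief = [[0.0] * n_dur for _ in range(n_modes)]
    for m2 in range(n_modes):
        for h2 in range(n_dur):
            # Case 1: the mode did not change, its counter dropped from h2 + 1 to h2.
            stay = belief[m2][h2 + 1] if h2 + 1 < n_dur else 0.0
            # Case 2: some mode m ran out of time (h == 0) and switched to m2.
            switch = sum(
                belief[m][0] * C[m][m2] * H[m][m2][h2] for m in range(n_modes)
            )
            # Weight by the likelihood of the observed transition under mode m2.
            new_belief[m2][h2] = T[m2][s][a][s_next] * (stay + switch)
    total = sum(map(sum, new_belief))
    if total == 0.0:
        raise ValueError("observed transition has zero probability under the belief")
    return [[p / total for p in row] for row in new_belief]


# Toy instance (made-up numbers): 2 modes, 2 states, 1 action, durations 0 or 1.
# Mode 0 tends to move to state 0, mode 1 tends to move to state 1.
T = [
    [[[0.9, 0.1]], [[0.9, 0.1]]],  # T[0][s][a] -> distribution over s_next
    [[[0.1, 0.9]], [[0.1, 0.9]]],  # T[1][s][a] -> distribution over s_next
]
C = [[0.0, 1.0], [1.0, 0.0]]       # a switch always goes to the other mode
H = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
prior = [[0.25, 0.25], [0.25, 0.25]]

# Observing a move from state 0 to state 1 shifts the belief towards mode 1.
posterior = update_belief(prior, s=0, a=0, s_next=1, T=T, C=C, H=H)
print(posterior)
```

Because the observed state transition is the only evidence about the hidden mode, the update multiplies the predicted (mode, duration) distribution by the transition likelihood and renormalizes. The table has one entry per (mode, duration) pair, so its size does not depend on a particle count.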
Similar resources
An Adaptive Approach to Increase Accuracy of Forward Algorithm for Solving Evaluation Problems on Unstable Statistical Data Set
Nowadays, hidden Markov models are extensively used for modeling stochastic processes. These models help researchers establish and implement the desired theoretical foundations using Markov algorithms such as the Forward algorithm. However, using the stability hypothesis and the mean statistic to determine the values of Markov functions on unstable statistical data sets has led to a significant reducti...
Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes
One general strategy for approximately solving large Markov decision processes is "divide-and-conquer": the original problem is decomposed into sub-problems which interact with each other, yet can be solved independently by taking into account the nature of the interaction. In this paper we focus on a class of "policy-coupled" semi-Markov decision processes (SMDPs), which arise in many nons...
Solving Hidden-Mode Markov Decision Problems
Hidden-Mode Markov decision processes (HM-MDPs) are a novel mathematical framework for a subclass of nonstationary reinforcement learning problems where environment dynamics change over time according to a Markov process. HM-MDPs are a special case of partially observable Markov decision processes (POMDPs), and therefore nonstationary problems of this type can in principle be addressed indirect...
Solving Generalized Semi-Markov Processes using Continuous Phase-Type Distributions
We introduce the generalized semi-Markov decision process (GSMDP) as an extension of continuous-time MDPs and semi-Markov decision processes (SMDPs) for modeling stochastic decision processes with asynchronous events and actions. Using phase-type distributions and uniformization, we show how an arbitrary GSMDP can be approximated by a discrete-time MDP, which can then be solved using existing M...